Interpretive Risk Assessment on GWA Data with Sparse Linear Regression
Abstract
Genome-wide association (GWA) studies provide large amounts of high-dimensional data. They aim to identify variables, i.e., single nucleotide polymorphisms (SNPs), that increase the risk for a given phenotype, and they have been successful in identifying susceptibility loci for several complex diseases. A remaining challenge, however, is to predict individual risk from the genetic pattern. Counting the number of unfavorable alleles is a standard approach to estimating disease risk. However, this approach limits risk prediction because it considers only a predefined subset of SNPs. Recent studies that apply SVM learning have improved risk prediction for Type I and Type II diabetes. A drawback of the SVM, however, is the poor interpretability of the classifier. The aim is thus to classify based on only a small number of SNPs, so that the resulting classifier also admits a genetic interpretation. In this work we propose an algorithm that does exactly this. We use a recently proposed approximation method for sparse linear regression problems that can be applied to large data sets, in order to search for the best sparse risk-predicting pattern among the complete set of SNPs.
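To make the contrast in the abstract concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes genotypes coded as unfavorable-allele counts (0, 1 or 2) and uses plain greedy matching pursuit as a stand-in for the unspecified approximation method. The function names `allele_count_score`, `sparse_fit` and `sparse_score` are hypothetical.

```python
from typing import Dict, List, Sequence


def allele_count_score(genotype: Sequence[int], risk_snps: Sequence[int]) -> int:
    """Baseline: sum unfavorable-allele counts over a predefined SNP subset."""
    return sum(genotype[j] for j in risk_snps)


def _center(values: Sequence[float]) -> List[float]:
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sparse_fit(X: Sequence[Sequence[int]], y: Sequence[float], k: int) -> Dict[int, float]:
    """Greedy matching pursuit (assumed stand-in): at most k SNPs get a weight."""
    n_snps = len(X[0])
    columns = [_center([row[j] for row in X]) for j in range(n_snps)]
    residual = _center(y)
    weights: Dict[int, float] = {}
    for _ in range(k):
        best_j, best_gain = None, 0.0
        for j, col in enumerate(columns):
            norm_sq = _dot(col, col)
            if norm_sq == 0.0:
                continue  # a SNP with no variation carries no information
            # Reduction in squared residual if this SNP alone is fitted.
            gain = _dot(col, residual) ** 2 / norm_sq
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None or best_gain < 1e-12:
            break  # nothing left to explain
        col = columns[best_j]
        step = _dot(col, residual) / _dot(col, col)
        weights[best_j] = weights.get(best_j, 0.0) + step
        residual = [r - step * c for r, c in zip(residual, col)]
    return weights


def sparse_score(genotype: Sequence[int], weights: Dict[int, float]) -> float:
    """Risk score from the selected SNPs only (intercept omitted)."""
    return sum(w * genotype[j] for j, w in weights.items())
```

Unlike the allele count, which is tied to a subset fixed in advance, the greedy search looks at every SNP and returns a short weight dictionary whose keys can be read directly as the selected loci.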
Similar resources
Robust Estimation in Linear Regression with Multicollinearity and Sparse Models
One of the factors affecting the statistical analysis of the data is the presence of outliers. The methods which are not affected by the outliers are called robust methods. Robust regression methods are robust estimation methods of regression model parameters in the presence of outliers. Besides outliers, the linear dependency of regressor variables, which is called multicollinearity...
Sparse Coding for Feature Selection on Genome-Wide Association Data
Genome-wide association (GWA) studies provide large amounts of high-dimensional data. GWA studies aim to identify variables that increase the risk for a given phenotype. Univariate examinations have provided some insights, but it appears that most diseases are affected by interactions of multiple factors, which can only be identified through a multivariate analysis. However, multivariate analys...
Identifying Genetic Risk Factors via Sparse Group Lasso with Group Graph Structure
Genome-wide association studies (GWA studies or GWAS) investigate the relationships between genetic variants such as single-nucleotide polymorphisms (SNPs) and individual traits. Recently, incorporating biological priors together with machine learning methods in GWA studies has attracted increasing attention. However, in real-world, nucleotide-level bio-priors have not been well-studied to date...
SNPboost: Interaction Analysis and Risk Prediction on GWA Data
Genome-wide association (GWA) studies, which typically aim to identify single nucleotide polymorphisms (SNPs) associated with a disease, yield large amounts of high-dimensional data. GWA studies have been successful in identifying single SNPs associated with complex diseases. However, so far, most of the identified associations do only have a limited impact on risk prediction. Recent studies ap...
Prediction of Cardiovascular Disease Outcomes and Established Cardiovascular Risk Factors by Genome-Wide Association Markers
Background—Genome-wide association (GWA) platforms have yielded a rapidly increasing number of new genetic markers. The ability of these markers to improve prediction of clinically important outcomes is debated. Methods and Results—A systematic review was performed of GWA-derived markers associated with cardiovascular outcomes or other phenotypes that represent common established risk factors f...
Publication date: 2010